Abstract: In this paper, we consider the problem of iterative
detection and decoding (IDD) for multi-antenna systems
using low-density parity-check (LDPC) codes. The proposed IDD 
system consists of a soft-input soft-output parallel interference
cancellation (PIC) scheme with linear minimum mean-square
error (MMSE) receive filters and two novel belief propagation 
(BP) decoding algorithms. The proposed BP algorithms exploit 
the knowledge of short cycles in the graph structure and the 
reweighting factors derived from the hypergraph's expansion. 
Simulation results show that when used to perform IDD for 
multi-antenna systems both proposed BP decoding algorithms 
can consistently outperform existing BP techniques with a small 
number of decoding iterations. 

I. Introduction

Multi-input and multi-output (MIMO) systems can support 
several independent data streams, resulting in a significant 
increase of the system capacity [1]. In order to separate the
data streams and mitigate the interference between them, a
detection algorithm must be employed at the receiver. In the
last decade or so, a great deal of effort has been devoted to
the development of detection algorithms and their integration
with channel decoding techniques [2]-[12]. In this context,
MIMO systems with joint detection/decoding have been shown 
to produce excellent results, approaching the performance of 
an interference free scenario. In a system with joint detec- 
tion/decoding an ideal receiver is comprised of two compo- 
nents: an efficient soft-input soft-output (SISO) MIMO signal 
detector and a SISO decoder with low delay. Specifically, the 
estimated log likelihood ratios associated with the encoded 
bits are computed by the detector and these estimates will 
serve as input to the decoder. Then in the second phase of the
detection/decoding iteration, the decoder generates a posteriori 
probabilities for encoded bits of each data stream. As a result, 
the soft estimate of the transmitted symbol is obtained which 
can facilitate the detection in the first phase of the next 
outer iteration. The joint process of detection/decoding is then 
repeated in an iterative manner until the maximum number of 
iterations is reached. However, in practice there are many open
issues for such an IDD scheme, e.g., severe detection/decoding
delay, especially for codes with short block lengths [4], [5], or
the prohibitively high computational complexity associated with
IDD systems in general.

Low-density parity-check (LDPC) codes, invented by Gallager
[13], are a class of linear block codes which can achieve
near-Shannon capacity with linear-time encoding and parallelizable
decoding algorithms. The standard BP algorithm is
well known as the most effective algorithm to decode LDPC
codes [14], and has been widely employed as part of IDD
schemes for MIMO systems ID, ||7| and ifTSl . It can produce 
exact inference solutions only if the graphical model does 
not contain short cycles. In the presence of cycles, the
standard BP algorithm has a number of shortcomings:
convergence to a codeword is not guaranteed, and convergence
can take many iterations, especially at low
signal-to-noise ratios (SNR). Both effects significantly deteriorate
the decoding performance and cause unexpected transmission
delay. As a result, many LDPC-coded MIMO systems
suffer some degree of performance degradation.
In [16], the authors converted the problem of finding the
fixed points of BP algorithms into that of solving a variational
problem, and defined a set of reweighting factors. Recently,
Wymeersch et al. [17] extended the use of the reweighted BP
algorithm from pairwise graphs to hypergraphs and reduced
the set of reweighting parameters to a constant value, whereas
Liu and de Lamare [18] considered two possible values.

In this paper, we develop an efficient IDD scheme for 
MIMO systems operating in a spatial multiplexing config- 
uration with a reduced complexity and a low delay. The 
proposed scheme consists of a SISO parallel interference 
cancellation (PIC) scheme with linear minimum mean-square 
error (MMSE) receive filters and two novel knowledge-aided 
(KA) belief propagation (BP) decoding algorithms. The first 
KA decoding algorithm is termed cycles knowledge-aided 
reweighted BP (CKAR-BP) algorithm, whereas the second 
KA decoding technique is called the expansion knowledge-aided
reweighted BP (EKAR-BP) algorithm. In the following, we 
present an IDD scheme for MIMO systems equipped with the 
proposed KA BP algorithms which can considerably improve 
the performance of existing schemes. The proposed CKAR-BP 
decoder takes advantage of the cycle distribution of the Tanner 
graph, while the proposed EKAR-BP decoder first expands 
the original graph into a number of subgraphs then locally 
optimizes the reweighting parameters. Incorporated with a 
SISO PIC-MMSE detector, both CKAR-BP and EKAR-BP 
algorithms are shown to outperform the standard BP and the 
uniformly reweighted BP (URW-BP) [17] algorithms when
performing IDD for MIMO systems. 

The organization of this paper is as follows: Section II 
introduces the system model. In Section III, the proposed 
EKAR-BP and CKAR-BP algorithms are explained in detail. 
Section IV shows the simulation results along with discus- 
sions. Finally, Section V concludes the paper. 



II. System Model 

Let us consider a narrowband MIMO system with $N_T$
transmit antennas and $N_R$ receive antennas ($N_R \geq N_T$). The
MIMO system operates in a spatial multiplexing configuration
and transmits data over flat fading channels. The received data
after demodulation, matched filtering and sampling is collected
in a vector $\mathbf{r} \in \mathbb{C}^{N_R \times 1}$ with sufficient statistics for detection
and given by

$$\mathbf{r} = \mathbf{C}\mathbf{s} + \mathbf{n}, \qquad (1)$$

where $\mathbf{C} \in \mathbb{C}^{N_R \times N_T}$ is the channel matrix, $\mathbf{s} \in \mathbb{C}^{N_T \times 1}$ is the
encoded data vector and $\mathbf{n} \in \mathbb{C}^{N_R \times 1}$ is the noise vector whose
elements have zero mean and power $\sigma^2$. In what follows, we assume
that the receiver has perfect knowledge of the channel matrix
$\mathbf{C}$. In practice, an estimation algorithm must be employed to
compute the parameters of $\mathbf{C}$ [10], [12].
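As a rough illustration of the signal model in (1), the following Python sketch draws one received vector. The QPSK mapping, the i.i.d. complex Gaussian channel, the function name `simulate_received_vector` and all parameter values are assumptions made for this example only; the paper does not specify them at this point.

```python
import numpy as np

def simulate_received_vector(n_t=4, n_r=4, noise_var=0.1, seed=0):
    """Draw one realization of r = C s + n (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    # Assumed channel: i.i.d. complex Gaussian entries with unit variance.
    C = (rng.standard_normal((n_r, n_t))
         + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
    # Assumed modulation: unit-energy QPSK, standing in for LDPC-coded symbols.
    bits = rng.integers(0, 2, size=(n_t, 2))
    s = ((1 - 2 * bits[:, 0]) + 1j * (1 - 2 * bits[:, 1])) / np.sqrt(2)
    # Zero-mean complex noise with total power noise_var per element.
    n = np.sqrt(noise_var / 2) * (rng.standard_normal(n_r)
                                  + 1j * rng.standard_normal(n_r))
    r = C @ s + n
    return C, s, n, r

C, s, n, r = simulate_received_vector()
```

The returned tuple keeps the channel, symbols and noise so that a detector sketch can be checked against the ground truth.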

A. PIC-MMSE Detection Algorithm 

In a SISO PIC-MMSE detection algorithm, the estimates 
of the transmitted symbols are obtained based on the a priori 
log-likelihood ratios (LLRs) obtained from the LDPC channel
decoder. These "soft" estimates are subtracted from the received
vector to perform interference cancellation for a MIMO system.
The remaining noise-plus-remaining-interference terms
are then equalized by a linear MMSE receive filter which is 
followed by the computation of the a posteriori LLRs of the 
individual constituent bits. The SISO PIC-MMSE algorithm 
used as an outer component is detailed in the following. 

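According to the SISO model in [2], when processing the
$k$-th stream, a PIC detector cancels the interference of all other
streams ($q \neq k$) such that

$$\hat{\mathbf{r}}_k = \mathbf{r} - \sum_{q \neq k} \mathbf{c}_q \bar{y}_q = \mathbf{c}_k s_k + \tilde{\mathbf{n}}, \quad \forall k, \qquad (2)$$

where $\bar{y}_q$, $q \neq k$, are the estimates of the transmitted
co-channel symbols obtained from the channel decoder, which
are computed according to $\bar{y}_q = E[y_q] = \sum_{a \in O} a\,P[y_q = a]$,
where $P[y_q = a]$ corresponds to the a priori probability of
the symbol $a$ on the constellation map $O$. The term $\mathbf{c}_k$ is the
$k$-th column of the channel matrix $\mathbf{C}$ and $\tilde{\mathbf{n}}$ is the
noise-plus-remaining-interference vector to be equalized by linear MMSE
receive filters as

$$\hat{y}_k = \mathbf{w}_k^H \hat{\mathbf{r}}_k = \mathbf{w}_k^H \mathbf{c}_k s_k + \mathbf{w}_k^H \tilde{\mathbf{n}}, \qquad (3)$$

in which $(\cdot)^H$ denotes the Hermitian transpose
and the MMSE receive filter is given by
$\mathbf{w}_k^H = E_s \mathbf{c}_k^H (\mathbf{C}\boldsymbol{\Lambda}_k\mathbf{C}^H + N_0 \mathbf{I}_{N_R})^{-1}$, where $E_s$ is the
transmission energy and $\boldsymbol{\Lambda}_k \in \mathbb{C}^{N_T \times N_T}$ is a diagonal matrix
whose entries are the variances of the estimation errors.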
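The two steps in (2) and (3) can be sketched for a single stream as follows. This is a minimal illustration and not the authors' implementation: the function name `pic_mmse_stream` is hypothetical, and the choice of placing $E_s$ on the $k$-th diagonal entry of $\boldsymbol{\Lambda}_k$ (with the residual symbol variances elsewhere) is an assumption commonly made for this kind of detector.

```python
import numpy as np

def pic_mmse_stream(r, C, sym_mean, sym_var, k, n0, es=1.0):
    """Soft interference cancellation (2) followed by MMSE filtering (3)."""
    # (2): subtract the soft estimates of all interfering streams q != k.
    others = np.asarray(sym_mean, dtype=complex).copy()
    others[k] = 0.0
    r_k = r - C @ others
    # Assumed Lambda_k: residual variances for q != k, full energy for stream k.
    lam = np.asarray(sym_var, dtype=float).copy()
    lam[k] = es
    cov = (C * lam) @ C.conj().T + n0 * np.eye(C.shape[0])
    # (3): w_k^H = Es c_k^H (C Lambda_k C^H + N0 I)^{-1}; cov is Hermitian,
    # so conjugating the solution of cov x = c_k gives the row vector w_k^H.
    w_h = es * np.linalg.solve(cov, C[:, k]).conj()
    y_hat = w_h @ r_k
    gain = (w_h @ C[:, k]).real      # effective gain multiplying s_k
    return y_hat, gain

rng = np.random.default_rng(1)
C = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
s = np.array([1, -1, 1, 1], dtype=complex)
r = C @ s                            # noise-free received vector for the demo
y_hat, gain = pic_mmse_stream(r, C, sym_mean=s, sym_var=np.zeros(4), k=2, n0=0.1)
```

With perfect soft estimates and no noise, the cancelled vector reduces to $\mathbf{c}_k s_k$, so dividing the filter output by the effective gain returns the transmitted symbol.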

B. Iterative Detection and Decoding 

A block diagram of the IDD system employed in this work
is depicted in Fig. 1. With the PIC-MMSE processing, we set
$y_k = s_k + n_{\mathrm{eff}}$ at the output of the detector, where $n_{\mathrm{eff}}$ is the
effective noise factor after the MMSE filtering. By assuming
that the output of the $k$-th layer $y_k$ is statistically independent
from the other layers [2], this leads to the approximation of
the log-likelihood ratio (LLR) of bit $x_{k,j}$

$$\Lambda_1[x_{k,j}] \approx \log \frac{P(x_{k,j} = +1 \mid y_k)}{P(x_{k,j} = -1 \mid y_k)} = \lambda_1[x_{k,j}] + \lambda_2[x_{k,j}], \qquad (4)$$

where the last term represents the a priori information for
the coded bits $x_{k,j}$, which is obtained by the LDPC decoder.
The first term $\lambda_1$ denotes the extrinsic information which is
computed based on $\mathbf{r}$ and the a priori information $\lambda_2$.

Fig. 1. Iterative LDPC-coded MIMO spatial multiplexing system
with a SISO PIC-MMSE detector and the proposed KA-BP decoders.

For the detector, by relaxing the stream index $k$, the coded bit extrinsic
LLR is obtained as



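$$\lambda_1[x_j] = \log \frac{\sum_{a_c \in A_j^{+1}} P(y \mid s = a_c) \exp(L_a(a_c))}{\sum_{a_c \in A_j^{-1}} P(y \mid s = a_c) \exp(L_a(a_c))}, \qquad (5)$$

where $A_j^{+1}$ and $A_j^{-1}$ denote the subsets of constellation $A$
where the bit $x_j$ takes the values 1 and 0, respectively. The
value $L_a(a_c)$ denotes the a priori symbol probability for
symbol $a_c$ and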
For an IDD scheme, the computed $\lambda_1$ is fed to the LDPC
decoder as the a priori information. The LDPC decoder
calculates the a posteriori LLR of each code bit as will be
detailed later.
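The LLR computation can be sketched numerically as below. Several points are assumptions of this example rather than statements from the paper: the likelihood is taken to be Gaussian in the effective noise, the symbol prior is formed as a product of independent bit priors, and the extrinsic value is obtained by subtracting the bit's own a priori LLR from the a posteriori ratio, following the decomposition in (4). The function name `extrinsic_llrs` is hypothetical.

```python
import numpy as np

def extrinsic_llrs(y, noise_var, constellation, labels, prior_llr):
    """Extrinsic bit LLRs, log P(x=1)/P(x=0), for one detector output y.

    labels[c, j] is the value (0 or 1) of bit j in symbol c;
    prior_llr[j] is the a priori LLR of bit j.
    """
    # Assumed Gaussian likelihood of y given each candidate symbol.
    log_lik = -np.abs(y - constellation) ** 2 / noise_var
    # Log a priori symbol probability (up to a constant) from the bit priors.
    log_prior = labels @ prior_llr
    metric = log_lik + log_prior
    posterior = np.empty(labels.shape[1])
    for j in range(labels.shape[1]):
        num = np.logaddexp.reduce(metric[labels[:, j] == 1])
        den = np.logaddexp.reduce(metric[labels[:, j] == 0])
        posterior[j] = num - den
    # Remove the a priori part so only extrinsic information is passed on.
    return posterior - prior_llr

# BPSK toy case: symbol -1 carries bit 0, symbol +1 carries bit 1.
constellation = np.array([-1.0, 1.0])
labels = np.array([[0], [1]])
ext = extrinsic_llrs(0.5, 1.0, constellation, labels, np.array([0.7]))
```

For BPSK the result does not depend on the prior, which is a quick sanity check that the a priori contribution has been removed.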

III. Knowledge-Aided Decoding Algorithms for 
IDD Schemes 

The proposed CKAR-BP and EKAR-BP algorithms are 
designed to improve the convergence behaviour of the standard 
BP algorithm by reweighting part of the hypergraph. These 
algorithms take the short cycles into account, such that the 
decoder can generate more accurate marginal distributions 
corresponding to coded data. The reweighting strategy was
first employed in the tree-reweighted BP (TRW-BP) algorithm
reported in [16], where the authors reformulated the BP decoding
problem into a tractable convex optimization problem that
iteratively computes beliefs and factor appearance probabilities
(FAPs). Later, with the same concept but additional constraints,
the uniformly reweighted BP (URW-BP) algorithm [17] was
introduced, for which the FAPs were constrained to be a
constant. A disadvantage of URW-BP is that it can only
be applied to regular LDPC codes. Compared to those two
methods, the CKAR-BP and EKAR-BP algorithms optimize the
FAPs offline by relaxing the constraints from [16] and [17].
Additionally, neither of them imposes extra computational
complexity on online decoding. Next, we present general message
passing rules for reweighted BP algorithms, then elaborate on
both the CKAR-BP and EKAR-BP decoders.

A. Message Passing Rules for Knowledge-Aided Decoders 

The message passing rules of reweighted BP algorithms
are briefly reviewed here; their derivation can be
found in [16] for pairwise interactions and in [17] for
higher-order interactions. Given a hypergraph having $N$ variable
nodes and $M$ check nodes and the reweighting vector
$\boldsymbol{\rho} = [\rho_1, \rho_2, \ldots, \rho_M]$, the message from the $j$-th variable node
$s_j$ to the $i$-th check node $c_i$ is given by

$$\Psi_{ji} = \lambda_{\mathrm{in},j} + \sum_{i' \in \mathcal{N}(j) \setminus i} \rho_{i'} \Lambda_{i'j} - (1 - \rho_i) \Lambda_{ij}, \qquad (7)$$

where $\mathcal{N}(j) \setminus i$ is the neighboring set of check nodes of $s_j$
except $c_i$. Since all messages are represented in LLRs, $\lambda_{\mathrm{in},j}$
is equal to $\lambda_1[x_j]$ in the first decoding iteration. We use $\Lambda_{ij}$
to denote messages sent from $c_i$ to $s_j$ in previous decoding
iterations; then, for check node $c_i$, $\Lambda_{ij}$ is updated as



$$\Lambda_{ij} = 2 \tanh^{-1}\Big( \prod_{j' \in \mathcal{N}(i) \setminus j} \tanh \frac{\Psi_{j'i}}{2} \Big), \qquad (8)$$

where 'tanh(·)' denotes the hyperbolic tangent function as
in the standard BP message passing rule to compute an LLR
message from check node $c_i$ to variable node $s_j$. Finally, we
have the belief $b(x_j)$ with respect to $x_j$ given by

$$b(x_j) = \lambda_{\mathrm{in},j} + \sum_{i \in \mathcal{N}(j)} \rho_i \Lambda_{ij}. \qquad (9)$$

The proposed KA-BP decoders iteratively employ (7)-(9) to
update the message regarding each node. At the end of
decoding, the belief $b(x_j)$ serves as the soft output for deciding the
value of $x_j$ or for generating the extrinsic information $\lambda_2[x_j]$
in the next IDD iteration. Notice that $\rho_i = 1, \forall i$, corresponds
to the standard BP algorithm, so that no additional complexity
is introduced due to the presence of $\boldsymbol{\rho}$ in real-time decoding.

B. Cycles Knowledge-Aided Reweighted BP (CKAR-BP)

Given the knowledge of the distribution of cycles in the
graph, the CKAR-BP algorithm selects the reweighting parameters
in order to mitigate the effect of short cycles, i.e., the
statistical dependency among the incoming messages exchanged
by nodes, which causes the outgoing messages to appear more
reliable than they actually are. The algorithm [19], used for
counting short cycles, is a matrix multiplication technique which
can find the girth $g$ implicitly and calculate the number of cycles
of length $g$, $g+2$ and $g+4$ explicitly.

TABLE I
Algorithm Flow of CKAR-BP Decoder

Offline Stage 1: counting short cycles

1: Run the algorithm [19] to count the number of cycles
with length $g$ passing the check node $c_i$, $\forall i$;

Offline Stage 2: determination of $\rho_i$ for the hypergraph

2: Determine variable FAPs for each check node:
if $g_{c_i} < \mu_g$, $\rho_i = 1$; otherwise $\rho_i = \rho_v$, where $\rho_v = 2/\bar{n}_D$;

Online Stage: real-time decoding

3: Update the belief $b(x_j)$ iteratively using reweighted
message passing rules (7)-(9) with optimized
$\boldsymbol{\rho} = [\rho_1, \rho_2, \ldots, \rho_M]$. Decoding stops if $\mathbf{H}\hat{\mathbf{x}}^T = \mathbf{0}$ or the
maximum number of decoding iterations is reached.

As shown in Table I, after running the algorithm for counting
cycles and calculating $\mu_g$, the average
number of length-$g$ cycles passing a check node, we determine
the reweighting parameters $\rho_i$ ($i = 1, 2, \ldots, M$) under a
simple criterion:



Pt 



1 if gc, < Pg, 
Py otherwise, 



(10) 



where $g_{c_i}$ is the number of length-$g$ cycles passing a check
node $c_i$, $\rho_v = 2/\bar{n}_D$ and $\bar{n}_D$ is the average connectivity for
$N$ variable nodes, which is computed by:

$$\bar{n}_D = \frac{1}{\int_0^1 \upsilon(x)\,dx} = \frac{M}{N \int_0^1 \nu(x)\,dx}, \qquad (11)$$

where $\upsilon(x)$ and $\nu(x)$ are the distributions of the variable nodes
and the check nodes, respectively. As an improvement to the
URW-BP algorithm [17], the proposed CKAR-BP requires
additional complexity due to the cycle counting algorithm
[19]. Most importantly, the CKAR-BP algorithm can improve the
performance of the BP algorithm when decoding LDPC codes
with both uniform structures (regular codes) and
non-uniform structures (irregular codes). More details of CKAR-BP
and its applications can be found in [18].
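A compact sketch of the reweighted message passing rules (7)-(9) together with the selection rule (10) is given below. It is illustrative only. The names `count_4cycles_per_check`, `ckar_weights` and `reweighted_bp` are hypothetical, the cycle counter handles only length-4 cycles (a simplified stand-in for the algorithm of [19]), and the decoder assumes the common convention that a positive LLR favours bit 0. Substituting (9) into (7) gives $\Psi_{ji} = b(x_j) - \Lambda_{ij}$, which the code uses directly.

```python
import numpy as np

def count_4cycles_per_check(H):
    """Length-4 cycles through each check node (simplified stand-in for [19])."""
    overlap = H @ H.T                       # variable nodes shared by check pairs
    np.fill_diagonal(overlap, 0)
    return (overlap * (overlap - 1) // 2).sum(axis=1)

def ckar_weights(cycles_per_check, avg_var_degree):
    """Rule (10): rho_i = 1 below the average cycle count, rho_v otherwise."""
    mu_g = cycles_per_check.mean()
    rho_v = 2.0 / avg_var_degree
    return np.where(cycles_per_check < mu_g, 1.0, rho_v)

def reweighted_bp(H, llr_in, rho, max_iter=30):
    """Flooding-schedule decoder using rules (7)-(9); positive LLR means bit 0."""
    M, N = H.shape
    mask = H.astype(bool)
    Lam = np.zeros((M, N))                  # check-to-variable messages
    for _ in range(max_iter):
        belief = llr_in + (rho[:, None] * Lam * mask).sum(axis=0)   # (9)
        Psi = belief[None, :] - Lam         # (7), rewritten using (9)
        t = np.where(mask, np.tanh(Psi / 2.0), 1.0)
        for i in range(M):
            idx = np.flatnonzero(mask[i])
            for j in idx:
                prod = np.prod(t[i, idx[idx != j]])
                prod = np.clip(prod, -0.999999, 0.999999)
                Lam[i, j] = 2.0 * np.arctanh(prod)                   # (8)
        belief = llr_in + (rho[:, None] * Lam * mask).sum(axis=0)
        x_hat = (belief < 0).astype(int)
        if not np.any((H @ x_hat) % 2):     # all parity checks satisfied
            break
    return x_hat, belief

# Toy parity-check matrix of a (7,4) Hamming code, all-zero codeword sent.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
llr = np.array([2.0, 2.0, 2.0, 2.0, 2.0, 2.0, -1.0])   # last bit received in error
x_hat, belief = reweighted_bp(H, llr, rho=np.ones(3))   # rho = 1 is standard BP
cycles = count_4cycles_per_check(H)
rho_demo = ckar_weights(np.array([0, 3, 3]), avg_var_degree=3.0)
```

Because $\boldsymbol{\rho}$ only scales the messages inside the belief sum, the online cost per iteration is the same as for standard BP, which matches the complexity remark in the text.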

C. Expansion Knowledge-Aided Reweighted BP (EKAR-BP)

The proposed EKAR-BP algorithm transforms the original
hypergraph $\mathcal{G}$ into a set of $T \geq 1$ subgraphs and then locally
optimizes the reweighting parameter vector $\boldsymbol{\rho}_t$, $t = 1, 2, \ldots, T$,
with respect to each subgraph, where the size of the $t$-th
subgraph determines the dimension of $\boldsymbol{\rho}_t$. It should be noted
that $T = 1$ corresponds to the original TRW-BP algorithm [16],
which has a computational complexity of $O(M^2 N)$ and for which
the convergence of $\boldsymbol{\rho}$ is very slow for large graphs. Nevertheless,
the optimization of $\boldsymbol{\rho}$ could be significantly less complex when
more subgraphs are considered. Thus, there is a need for
a flexible method to decompose the original hypergraph into
subgraphs. Inspired by [20], we apply a modified progressive
edge-growth (PEG) approach to achieve the hypergraph expansion.
Generally, the number of subgraphs $T$ depends on a
pre-set maximum expansion level $d_{\max}$, as a large $d_{\max}$ results
in a small $T$ but a high probability of existence of very short
cycles within subgraphs. Compared to the greedy version of
PEG [20], our modified PEG expansion has two differences:
(i) the expansion stops as soon as every member of the set of
nodes $V_t$ has been visited; (ii) the number of edges incident
to the node $s_j$ might be less than its degree since some short
cycles are excluded in subgraphs to guarantee that the local
girth of each subgraph $g_t$ is always larger than the global girth
of the original graph $g$.

TABLE II
Algorithm Flow of EKAR-BP Decoder

Offline Stage 1: subgraphs formation

1: Given a hypergraph $\mathcal{G}$ and $d_{\max}$, apply the modified PEG
expansion to generate $T \geq 1$ subgraphs;

Offline Stage 2: optimization of $\boldsymbol{\rho}_t$ for the $t$-th subgraph

2: Initialize $\boldsymbol{\rho}_t^{(0)}$ to a valid value;

3: For each subgraph, calculate the beliefs $b(x_t)$ and
the mutual information term $\mathbf{I}_t = [I_{t,1}, I_{t,2}, \ldots, I_{t,L_t}]$
by using reweighted message passing rules (7)-(9);

4: With $b(x_t)$ and $\mathbf{I}_t$ obtained from step 3, update
$\boldsymbol{\rho}_t^{(r)}$ to $\boldsymbol{\rho}_t^{(r+1)}$ using the conditional gradient method;

5: Repeat steps 3-4 until $\boldsymbol{\rho}_t$ converges for each subgraph;

Offline Stage 3: choice of $\boldsymbol{\rho} = [\rho_1, \rho_2, \ldots, \rho_M]$ for decoding

6: For all $T$ subgraphs, collect $\boldsymbol{\rho}_1, \ldots, \boldsymbol{\rho}_t, \ldots, \boldsymbol{\rho}_T$.
In case of multiple values $\rho_i$ for the same $i$-th check node,
choose the one offering the best performance;

Online Stage: real-time decoding

7: Update the belief $b(x_j)$ iteratively using reweighted
message passing rules (7)-(9) with the optimized
$\boldsymbol{\rho} = [\rho_1, \rho_2, \ldots, \rho_M]$. Decoding stops if $\mathbf{H}\hat{\mathbf{x}}^T = \mathbf{0}$ or the
maximum number of decoding iterations is reached.
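To give a feel for depth-limited graph expansion, the sketch below grows a subgraph breadth-first from a root variable node and skips any check node already visited, so the result is cycle-free. This is a loose illustration under simplifying assumptions and not the modified PEG procedure itself: the paper only requires the local girth to exceed the global girth, whereas this sketch excludes every cycle, and the function name `expand_subgraph` is hypothetical.

```python
from collections import deque

def expand_subgraph(H, root, d_max):
    """Breadth-first expansion from variable node `root` up to depth d_max."""
    num_checks, num_vars = len(H), len(H[0])
    var_to_checks = [[i for i in range(num_checks) if H[i][j]]
                     for j in range(num_vars)]
    check_to_vars = [[j for j in range(num_vars) if H[i][j]]
                     for i in range(num_checks)]
    visited_vars, visited_checks, edges = {root}, set(), []
    frontier = deque([(root, 0)])
    while frontier:
        v, depth = frontier.popleft()
        if depth >= d_max:
            continue                      # expansion level limit reached
        for c in var_to_checks[v]:
            if c in visited_checks:
                continue                  # this edge would close a cycle
            visited_checks.add(c)
            edges.append((v, c))
            for u in check_to_vars[c]:
                if u in visited_vars:
                    continue              # this edge would close a cycle
                visited_vars.add(u)
                edges.append((u, c))
                frontier.append((u, depth + 1))
    return visited_vars, visited_checks, edges

# Toy (7,4) Hamming parity-check matrix; expand one level from variable node 3.
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]
sub_vars, sub_checks, sub_edges = expand_subgraph(H, root=3, d_max=1)
```

In this toy case one expansion level already reaches every node, and the subgraph keeps fewer edges than the original graph because the cycle-closing edges are dropped, which mirrors difference (ii) above.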

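As shown in Table II, after obtaining $T$ subgraphs, we
introduce $\mathbf{L} = [L_1, L_2, \ldots, L_T]$ in which $L_t$ is the number
of check nodes in the $t$-th subgraph. Note that $\sum_t L_t \geq M$
due to duplicated nodes during hypergraph expansion. With
the $t$-th subgraph, we optimize the associated FAPs
$\boldsymbol{\rho}_t = [\rho_{t,1}, \rho_{t,2}, \ldots, \rho_{t,L_t}]$ using a recursive optimization method,
similar to TRW-BP [16] but with higher-order interactions
and related message passing rules (7)-(9). The optimization
problem is solved recursively as follows: 1) for all $T$ subgraphs
in parallel and fixed $\boldsymbol{\rho}_t^{(r)}$, use message passing rules (7)-(9) to
calculate the beliefs $b(x_t)$ as well as the mutual information
term $\mathbf{I}_t = [I_{t,1}, I_{t,2}, \ldots, I_{t,L_t}]$ provided with $L_t < M$ check
nodes in the $t$-th subgraph; 2) for all $T$ subgraphs in parallel,
given $\{\mathbf{I}_t\}_{t=1}^{T}$, use the conditional gradient method to update,
for all $t$, $\boldsymbol{\rho}_t^{(r)}$ to $\boldsymbol{\rho}_t^{(r+1)}$, then go back to step 1).

The optimization problem is given by

$$\begin{aligned} \text{minimize} \quad & -\boldsymbol{\rho}_t^T \mathbf{I}_t \\ \text{s.t.} \quad & \boldsymbol{\rho}_t \in \mathbb{T}(\mathcal{G}_t), \end{aligned}$$

where $(\cdot)^T$ denotes matrix transpose, $\mathbb{T}(\mathcal{G}_t)$ is the set of
all valid FAPs over the subgraph $\mathcal{G}_t$ and $I_{t,l}$ is a mutual
information term depending on $\boldsymbol{\rho}_t^{(r)}$, the previous value of
$\boldsymbol{\rho}_t$. By denoting the objective function as $f(\boldsymbol{\rho}_t) = -\boldsymbol{\rho}_t^T \mathbf{I}_t$, we
first linearize the objective around the current value $\boldsymbol{\rho}_t^{(r)}$:

$$f_{\mathrm{lin}}(\boldsymbol{\rho}_t) = f(\boldsymbol{\rho}_t^{(r)}) + \nabla_{\boldsymbol{\rho}_t} f(\boldsymbol{\rho}_t^{(r)})^T (\boldsymbol{\rho}_t - \boldsymbol{\rho}_t^{(r)}), \qquad (12)$$

where $\nabla_{\boldsymbol{\rho}_t} f(\boldsymbol{\rho}_t^{(r)}) = -\mathbf{I}_t$. Secondly, we minimize $f_{\mathrm{lin}}(\boldsymbol{\rho}_t)$
with respect to $\boldsymbol{\rho}_t$, denoting the minimizer by $\boldsymbol{\rho}_t^{*}$ and
$z^{(r+1)} = \max(f_{\mathrm{lin}}(\boldsymbol{\rho}_t^{*}), z^{(r)})$, where $z^{(0)} = -\infty$. Finally, $\boldsymbol{\rho}_t^{(r)}$ is updated
to $\boldsymbol{\rho}_t^{(r+1)}$ as

$$\boldsymbol{\rho}_t^{(r+1)} = \boldsymbol{\rho}_t^{(r)} + \alpha (\boldsymbol{\rho}_t^{*} - \boldsymbol{\rho}_t^{(r)}), \qquad (13)$$

in which $\alpha$ is chosen as

$$\alpha = \arg\min_{\alpha \in [0,1]} f\big(\boldsymbol{\rho}_t^{(r)} + \alpha (\boldsymbol{\rho}_t^{*} - \boldsymbol{\rho}_t^{(r)})\big). \qquad (14)$$

At every recursion, $f(\boldsymbol{\rho}_t^{(r)})$ is an upper bound on the optimized
objective, while $z^{(r+1)}$ is a lower bound. Note that the
proposed EKAR-BP algorithm is straightforward to use if the
LDPC code was designed by PEG or its variations [21], [22],
but is not limited to such designs.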
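The recursion in (12)-(14) follows the usual conditional gradient pattern: linearize, minimize the linearization over the feasible set, track the lower bound, then take a line-search step. The sketch below runs that pattern on a toy problem. The quadratic objective and the box constraint are stand-ins chosen for this example, since the actual objective depends on the mutual information terms and the actual feasible set is $\mathbb{T}(\mathcal{G}_t)$; the function name `conditional_gradient_box` is hypothetical.

```python
import numpy as np

def conditional_gradient_box(target, rho0, lo=0.0, hi=1.0, num_iter=500):
    """Conditional gradient on f(rho) = 0.5 * ||rho - target||^2 over a box."""
    f = lambda rho: 0.5 * np.sum((rho - target) ** 2)
    grad = lambda rho: rho - target
    rho = np.asarray(rho0, dtype=float).copy()
    lower = -np.inf                                   # z^(0) = -infinity
    for _ in range(num_iter):
        g = grad(rho)
        # Minimizer of the linearized objective (12) over the box.
        rho_star = np.where(g > 0, lo, hi)
        f_lin = f(rho) + g @ (rho_star - rho)
        lower = max(lower, f_lin)                     # running lower bound z
        d = rho_star - rho
        dd = d @ d
        if dd == 0.0:
            break                                     # already at the minimizer
        # Exact line search (14); closed form for this quadratic objective.
        alpha = float(np.clip(-(g @ d) / dd, 0.0, 1.0))
        rho = rho + alpha * d                         # update (13)
    return rho, f(rho), lower

target = np.array([0.3, 0.8, 1.5])
rho, upper, lower = conditional_gradient_box(target, rho0=np.full(3, 0.5))
```

The returned `upper` and `lower` values bracket the optimal objective, in the same way that $f(\boldsymbol{\rho}_t^{(r)})$ and $z^{(r+1)}$ do in the text, so their gap can serve as a stopping criterion.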

IV. Simulation Results 

In this section, we present the simulation results of the
proposed IDD scheme with the CKAR-BP and EKAR-BP
algorithms for a 4 × 4 LDPC-coded MIMO system with PIC-MMSE
detection. The LDPC code is a regular code designed
by the PEG algorithm [20] whose block length $N$ is 1000, the
rate $R$ is 0.5, the girth ($g$) is 6, and the degree distributions
are 3 ($\upsilon(x) = x^2$) and 6 ($\nu(x) = x^5$), respectively. We consider
uncorrelated Rayleigh flat fading channels and use 30 inner
decoding iterations in this experiment. For the EKAR-BP
decoder, $T = 20$ subgraphs have been generated, where check
nodes are allowed to be re-visited, and 600 recursions were
employed to obtain $\boldsymbol{\rho}$.
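As a small worked example, the average connectivity in (11) and the resulting $\rho_v = 2/\bar{n}_D$ can be evaluated for a regular rate-0.5 code with variable-node degree 3 and check-node degree 6. The sketch assumes that the degree distributions are edge-perspective polynomials, so that degree 3 corresponds to $x^2$ and degree 6 to $x^5$; the helper name `integrate_poly` is hypothetical.

```python
def integrate_poly(coeffs):
    """Integral over [0, 1] of sum_d coeffs[d] * x**d."""
    return sum(c / (d + 1) for d, c in enumerate(coeffs))

N, M = 1000, 500                      # block length and number of checks (rate 0.5)
var_poly = [0, 0, 1]                  # x^2: every edge attaches to a degree-3 variable node
chk_poly = [0, 0, 0, 0, 0, 1]         # x^5: every edge attaches to a degree-6 check node

n_d_from_vars = 1.0 / integrate_poly(var_poly)           # first form of (11)
n_d_from_checks = M / (N * integrate_poly(chk_poly))     # second form of (11)
rho_v = 2.0 / n_d_from_vars
```

Both forms of (11) give an average connectivity of 3 for this code, so check nodes with at least the average number of short cycles would receive $\rho_v = 2/3$ under rule (10).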




[Figure 2: EXIT chart; horizontal axis: mutual information at output of decoder]

Fig. 2. EXIT charts of different decoders with a PIC detector. The proposed 
EKAR-BP decoder matches better with the PIC detector than other decoders. 
The EXIT chart of the PIC detector is obtained at $E_b/N_0 = 4$ dB.



In comparison with the standard BP and URW-BP algorithms,
we first draw extrinsic information transfer (EXIT)
charts of different decoders with the SISO PIC detector in
Fig. 2. Although the curve of the PIC-MMSE detector does
not reach the top-right (1,1) point at the given SNR, it is
clear that the combination of the PIC-MMSE detector and
the proposed EKAR-BP decoder creates the widest detection
and decoding tunnel. Additionally, only the tunnel between
the PIC-MMSE detector and the standard BP decoder is
closed at an early stage, which indicates that the performance
gain from the IDD process could be significantly diminished
in this case. To verify the result of the EXIT chart, Fig. 3
depicts the performance in bit-error ratio (BER) of the
MIMO system. We have used 30 inner decoding iterations
and up to 3 outer detection and decoding iterations. The
performance curves after 2 outer iterations are denoted by
solid lines while the curves after 3 outer iterations are denoted
by dashed lines. From Fig. 3, both the CKAR-BP and EKAR-BP
decoders outperform the standard BP and URW-BP decoders
in the first detection and decoding iteration. In the third outer
iteration, the proposed decoders are still able to generate
relatively good performance when considering the low SNR
range and the block length of the code. Notice that there is an
error floor effect at the BER of 10^, which can be mitigated
by using decision feedback techniques [5], [11] and [12]. As
mentioned in Section III, the key feature of the proposed KA-BP
decoders is that no additional complexity is imposed
in real-time decoding since the optimization of $\boldsymbol{\rho}$ is carried
out offline. Moreover, by increasing the number of subgraphs
$T$, the EKAR-BP can accelerate the optimization process such
that it can be employed for time-varying channels.

Fig. 3. Comparison of the standard BP, URW-BP, CKAR-BP and EKAR-BP
in terms of BER performance for a 4 × 4 system.

V. Conclusion

We have proposed an IDD scheme for MIMO systems
with a conventional PIC-MMSE detector and two novel KA-BP
decoders, which implement the reweighting strategy for
decoding finite-length regular or irregular LDPC codes. The
proposed CKAR-BP and EKAR-BP algorithms have different
computational costs in the optimization phase, but neither of
them requires extra complexity for online decoding. Furthermore,
the EKAR-BP algorithm provides a trade-off between
the number of expanded subgraphs and the convergence speed
of the reweighting parameters. Numerical results show that the
proposed IDD system is able to offer good performance while
using a reduced number of inner and outer iterations.